Counting generations in birth and death processes with competing Erlang and exponential waiting times

Lymphocyte populations, stimulated in vitro or in vivo, grow as cells divide. Stochastic models are appropriate because some cells undergo multiple rounds of division, some die, and others of the same type in the same conditions do not divide at all. If individual cells behave independently, then each cell can be imagined as sampling from a probability density of times to division and death. The exponential density is the most mathematically and computationally convenient choice. It has the advantage of satisfying the memoryless property, consistent with a Markov process, but it overestimates the probability of short division times. With the aim of preserving the advantages of a Markovian framework while improving the representation of experimentally-observed division times, we consider a multi-stage model of cellular division and death. We use Erlang-distributed (or, more generally, phase-type distributed) times to division, and exponentially distributed times to death. We classify cells into generations, using the rule that the daughters of cells in generation n are in generation \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n+1$$\end{document}n+1. In some circumstances, our representation is equivalent to established models of lymphocyte dynamics. We find the growth rate of the cell population by calculating the proportions of cells by stage and generation. The exponent describing the late-time cell population growth, and the criterion for extinction of the population, differs from what would be expected if N steps with rate \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\lambda$$\end{document}λ were equivalent to a single step of rate \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\lambda /N$$\end{document}λ/N. We link with a published experimental dataset, where cell counts were reported after T cells were transferred to lymphopenic mice, using Approximate Bayesian Computation. In the comparison, the death rate is assumed to be proportional to the generation and the Erlang time to division for generation 0 is allowed to differ from that of subsequent generations. The multi-stage representation is preferred to a simple exponential in posterior distributions, and the mean time to first division is estimated to be longer than the mean time to subsequent divisions.


Methods: multi-stage models of cell division and death with a Markovian framework
We present a multi-stage (MS) model of the time between cell divisions. Cells pass through a sequence of N stages before dividing. The stages are not directly related to the biological phases of the cellular cycle. The time to progress from stage j to the next one, j + 1 , is an exponentially-distributed random variable with mean 1/ (j) . We will refer to these rates, (j) , j = 1, . . . , N , as birth rates. Times to death are also distributed exponentially, with per cell death rate µ . Thus, at each stage, each cell may either proceed to the next one, with probability (j) /( (j) + µ) , or die, with probability µ/( (j) + µ) . The time increment is a random variable, following the exponential distribution with mean 1/( (j) + µ). Figure 1 illustrates the dynamics. Our multi-stage model is equivalent to considering two independent clocks for cell division and death, which compete to decide the cellular fate. The time-to-death clock follows an exponential distribution with rate µ , while the division time follows a continuous phase-type distribution with parameters τ and T 34 . A particular choice of phase-type distribution is the Erlang( , N) , which is a concatenation of N identically distributed exponential steps, where all birth rates are equal: (j) = , j = 1, . . . , N . The case µ = 0 has been considered by Yates et al. 30 .
The number of cells in stage j at time t, is the random variable S j (t) , j = 1, . . . , N . Let M j (t) = E[S j (t)] , be the expected value of S j (t) . The following set of differential equations may be obtained by considering the events that can happen in a short time interval: www.nature.com/scientificreports/ When we extend the MS model to assign a generation to each cell, we refer to the model as the MS-G model. In this way, mean quantities can be compared with CFSE experimental data 8 . Histograms of CFSE intensity display a series of peaks, each corresponding to a generation, or a number of divisions over the course of the experiment 35 .
In the MS-G model, generation g ≥ 0 is split into N g different stages. The notation N g reflects the fact that the number of stages may depend on the generation g. A cell in generation g has to sequentially visit all N g compartments to divide. On the other hand, cells might also die at any stage of the cycle. As depicted in Fig. 2, if a cell belongs to generation g and is in compartment j, j = 1, . . . , N g − 1 , it may proceed to the following stage, with birth rate g , or die with death rate µ g . Again the notation reflects the potential for these rates to depend on the  Each cell in the first stage of generation 0 has to visit all the N 0 compartments (or stages) in order to divide. When cells arrive at the last stage of generation 0, N 0 , they may divide with birth rate 0 , or die with death rate µ 0 . If a cell divides, its daughter cells join the first compartment of the next generation, and the process continues. www.nature.com/scientificreports/ generation. When a cell reaches the last stage, N g , of generation g and divides, its two daughters will join the first compartment of generation g + 1 . In summary, given a cell in generation g, its time to division follows an Erlang distribution with parameters ( g , N g ) , whereas its time to death follows an exponential distribution with rate µ g . These distributions correspond to two independent competing clocks to control cellular fate, similarly to those considered in Fig. 1. The number of cells in stage j of generation g at time t is the random variable S g j (t) , g ≥ 0 , j = 1, . . . , N g . Let M g j (t) = E[S g j (t)] be the expected value of S g j (t) . The following set of differential equations may be obtained by considering the events that can happen in a short time interval: We are interested in computing the mean number of cells over time for the MS and MS-G models. Specifically the MS-G model will provide the mean number of cells in each generation, and thus, can be used together with CFSE data to obtain division and death rates. When division times are Erlang distributed (MS model), or if one considers that those Erlang distributions are identical across generations (MS-G model), it is possible to carry out a comprehensive analytical study. This is shown in the "Analytical results" section.
When convenient analytical solutions cannot be obtained, (2.1)-(2.2) can be solved numerically in different ways. For example, for the MS-G model, and keeping in mind our interest in modelling CFSE data, we assume there exists a maximum generation G that can be measured by the dye. Thus, one might be interested in following cells within generations g = 0, . . . , G . For these generations, Eq. (2.2) can be solved by making use of the matrix exponential. To this end, let M(t) be the column vector of the mean number of cells in each stage and generation as time evolves, i.e., which has length G g=0 N g , and where the column sub-vectors M g (t) contain the mean number of cells across stages in generations g = 0, . . . , G . Let us also define the coefficient matrix where A gg is a square N g × N g matrix, whereas A g,g−1 is a N g × N g−1 matrix. A is then a real square matrix of dimension G g=0 N g , and 0 a×b represents a null matrix with dimension a × b . Given the vector of the initial conditions n 0 , which has length G g=0 N g , the system of Eq. (2.2) can be rewritten as the following Cauchy problem The solution of the system is given by M(t) = e At n 0 , where represents the matrix exponential. For efficient ways of computing this matrix, see Refs. [36][37][38][39] . Finally, we note that since CFSE data describe the number of cells in each generation, one can then compute the mean number of cells in each generation over time as  29 , a cell's time to division is a gamma-distributed random variable, and time to death is exponentially distributed. Solutions are given in terms of integral equations. Here, with Erlang-distributed division times, we find a set of linear differential equations for the expected number of cells in each stage.

Analytical results
In this Section, we show how the Markovian framework of the proposed multi-stage models provides analytical tractability under some simplifying assumptions. Our aim is to compute the mean number of cells in each stage and generation over time, especially the limiting behaviour as t → +∞.

MS model with Erlang division time.
In this Section, we consider a simple case of the MS model, where identical birth rates are assumed across different stages; that is, (j) = , j = 1, . . . , N . The phase-type distribution for the time to division in Fig. 1 is Erlang( , N) and the mean time to division is given by N . Note that when N = 1 the MS model becomes a Markov linear birth-and-death process, with birth rate, , and death rate, µ . Equation (2.1) becomes As in Yates et al. 30 , we introduce the new variables m j (t) = e ( +µ)t M j (t) , j = 1, . . . , N , which satisfy the following ODEs: We find an Nth-order homogeneous differential equation for m N (t) that does not depend on µ: Therefore, the expected total number of cells in the population at time t, M(t), is given by   where r = µ µ+ . When N = 1 , p 1 = µ ; when N = 2 , p 1 = µ 2 +2µ 2 . The analytical solutions (3.5)-(3.6) provide another route to study the limiting behaviour as t → +∞ . The largest term in the summation of (3.5) is the one corresponding to k = 0 . The combination (2 1/N − 1) − µ , is positive if µ < (2 1/N − 1) , negative, when µ > (2 1/N − 1) , or zero if µ = (2 1/N − 1) . Thus, extinction of the cell population is certain if µ > (2 1/N − 1) . Figure 3 (centre) shows an example of extinction when N = 5, = 0.5, µ = 0.1 and the initial number of cells C 0 = 10 2 . An example of behaviour when µ = (2 1/N − 1) is shown in the left panel of Fig. 3.
If µ < (2 1/N − 1) then, as t → +∞, and which is illustrated in Fig. 3 (right). The total population size is easily obtained using As t → +∞, (3.7)   Fig. 4, is lower than would be expected if N steps with rate were equivalent to a single step of rate /N . As N → +∞ , we have Nσ N → log 2 . In terms of (3.10), σ N < /N since M N (t) < 1/N as t → +∞ . Because the cell population is unevenly distributed across stages, with a bias towards earlier stages in the long run, N steps with rate are not equivalent to a single step of rate /N.

Mean fraction of cells at each stage.
We define the mean fraction of cells in each stage, P j (t) , as the ratio between the mean number of cells in compartment j and the expected total number of cells in the population, i.e., We make use of (3.1) and (3.10) to write, which have the following steady state solution One observes that P * j < P * j−1 , j = 1, . . . , N − 1 , which means (on average) the fraction of cells decreases stage by stage, independently of the initial distribution of cells. In fact, one can solve (3.14) to determine P * j , as follows which does not depend on or µ . Thus, at late times the fraction of cells in each stage only depends on the number of stages considered; the parameter sets the timescale of the dynamical system, and all cells are equally susceptible to death, regardless of the stage they are in.

MS-G model with identical Erlang division times across generations.
The solutions of the system (2.2) can be written in a closed analytical form in particular cases. For example, one may consider a simplified scenario where the number of stages is equal to 1 for all the generations, i.e., N g = 1 for all g ≥ 0 . Then, if we consider that at time t = 0 , there are C 0 cells in generation 0, so that n 0 = (C 0 , 0, . . . , 0) , this leads to the following solutions: In this case the MS-G model becomes a birth-and-death process tracking cell generations, and becomes identical to that considered in Refs. 27,[40][41][42] , where the inter-event times of cell death and division are modelled as exponential random variables, rather than Erlang distributions. www.nature.com/scientificreports/ In this Section we consider the case with identical number of stages, N, and rates, and µ , for each generation, so that division times are Erlang-distributed in each generation. Under these assumptions, it is possible to obtain an analytical expression for the mean number of cells in each generation. Then Eq. (2.2) becomes These equations can be rewritten in terms of the new variables m g This is equivalent to multiplying (3.17) by the integrating factor e ( +µ)t . Thus, (3.17) becomes To determine the solutions of (3. From the previous equations, one can show that since cells in each generation and compartment either proceed to the next stage within their generation, divide (proceeding to the next generation), or die. Once the mean number of cells in each compartment for a given generation is at hand, the expected number of cells in each generation can be determined according to (2.3). We can write This equation is consistent with the results of the exponential model ( N = 1) 27 . On the other hand, if one is interested in the mean number of cells in each compartment, M j (t) for j = 1, . . . , N , regardless of the generation they belong to, this can be computed as follows for j = 1, . . . , N and t ≥ 0 . In practice, one could truncate the series above to get an approximation of the mean number of cells in each stage. However, we note that one can use instead the solution provided by (3.5), since the dynamics of the MS-G model is equivalent to the dynamics of the MS model, when the parameters N, and µ are generation-independent. It can be numerically checked that this indeed provides equivalent results. In fact, when N = 1 or N = 2 , one can analytically show the equivalence. In the former case ( N = 1 ), it is enough to recall the power series of the exponential function. In the latter case ( N = 2 ), we derive from (3.5) Comparison between the MS-G model and the cyton model. The cyton model is a stochastic model proposed to describe the population dynamics of B and T lymphocytes 13 . Division and death times are regulated by two independent clocks, and the competition between both clocks determines the fate of the cell. When a cell divides, these clocks, which depend on the number of divisions the cell has undergone, are reset for each daughter cell. However, when analysing an in vitro experiment with this type of cells, there is evidence that not all cells either divide or die. For instance, a portion of them may not respond to the stimulation 43 , or may respond without division 44 . This is the reason why a progressor fraction is defined in the cyton model. This progressor fraction represents for a given generation, the fraction of cells that are capable of undergoing further division. Each clock is described by a probability density function, and the parameters that define these probabilities are the free parameters in the model. Right skewed distributions, such as log-normal or gamma, are usually adopted to characterise the two independent clocks that regulate cell division and death. In summary, the cyton model is based on the following assumptions: • death and division are random events, characterised by a probability density function for the time to divide or die, respectively, • these processes are independent, and compete to determine the fate of the cell, • the clocks responsible for these processes are reset when a cell divides, • only a fraction of the cells in each generation are capable to undergo further divisions, and • the machineries that regulate cellular fate depend on the cell's generation.
In order to translate these assumptions into mathematical terms, let γ g be the progressor fraction characterising cells having undergone g divisions, and let φ g (·) and ψ g (·) represent the probability density functions for the time to division and death, respectively, for cells in generation g. The number of cells dividing for the first time, or dying, per unit time at time t ≥ 0 can be calculated, respectively, as 13 : where C 0 is the initial number of cells in the population. Consequently, the time evolution of the expected number of cells in generation 0, M 0 (t) , obeys the differential equation The number of cells in generation g dividing, or dying, per unit time at time t can be computed, respectively, as Hence, the dynamics of the average number of cells in each generation, M g (t) , is governed by the differential equations In the next sections we show how the cyton model is equivalent to our model for particular choices of the probability density functions of the division and death clocks, φ g (·) and ψ g (·) , and the progressor faction γ g .
(3.23) n div (3.28) d M g (t) dt = 2n div g−1 (t) − n div g (t) − n die g (t), g ≥ 1. www.nature.com/scientificreports/ Exponential time to division and death. We consider here the MS-G model with number of stages across generations equal to one, i.e., N g = 1 for all g ≥ 0 . This means that cells in generation g divide after an exponentially distributed time with rate g , and die with rate µ g . We note that this is different to a standard Markov birth-anddeath process, since rates are generation-dependent. Equation (2.4) become In this case, our model is equivalent to the cyton model with exponential times for division and death, and progressor fraction γ g = 1 , g ≥ 0 . One can show this equivalence by proving that n div g (t) = g M g (t) and n die g (t) = µ g M g (t) , by induction on g. In the cyton model, the assumption of exponential time to division and death implies that φ g (t) = g e − g t and ψ g (t) = µ g e −µ g t , g ≥ 0 . Therefore, according to (3.23) and (3.24), the number of cells at time t dividing for the first time or dying to exit generation 0 per unit time is given by We know from (3.16) that M 0 (t) = C 0 e −( 0 +µ 0 )t . Therefore, we can write n div 0 (t) = 0 M 0 (t) and n die 0 (t) = µ 0 M 0 (t) , which proves the case g = 0 . We assume n div g (t) = g M g (t) and n die g (t) = µ g M g (t) hold for generation g and we need to show they also hold for generation g + 1 . We make use of (3.16) and (3.26) to write For the number of cells in generation g + 1 dying, Eq. (3.27), together with (3.16) lead to which concludes the proof. With the identities n div g (t) = g M g (t) and n die g (t) = µ g M g (t) in (3.25) and (3.28), one can show that M g (t) and M g (t) obey the same differential equations for all g ≥ 0 . Thus, the two models are equivalent.
Erlang time to division and exponential time to death. We now consider the more interesting case where the number of stages in each generation is greater than one, and the cell cycle can be described as a multi-stage process. We focus here on the case where identical number of stages N and birth and death rates, and µ , respectively, are considered across generations. Similarly to the previous case, we prove that n div g (t) = M g N (t) and n die g (t) = µM g (t) by induction on g. Since a cell's time to division is Erlang distributed and a cell's time to death is exponentially distributed, ψ g (t) = µe −µt for all g ≥ 0 and where the progressor fraction is again set to 1 for each generation. Note that in this case the parameters in φ g (·) and ψ g (·) are independent of the generation g, since the number of stages and the birth and death rates are identical for all generations. From (3.23) and (3.24), the number of cells dividing for the first time or dying to exit generation 0 per unit time at time t is The dynamics of the expected number of cells in generation 0 is given by (3.25), as in the previous case. Using (3.20) and (3.21), we observe that n div www.nature.com/scientificreports/ Therefore, n div 0 (t) = M 0 N (t) and n die 0 (t) = µM 0 (t) , which concludes the case g = 0 . We make use of these identities in (3.25) to obtain which is the differential equation derived in (2.4) for M 0 (t) . Now, let us suppose that the identities n div g (t) = M g N (t) and n die g (t) = µM g (t) hold for generation g and we prove them for generation g + 1 . Using (3.26) and the induction hypothesis, we have where we have used (3.20) for the last step. The same arguments can be used to look at the number of cells in generation g + 1 dying per unit of time, (3.27). Together with the induction hypothesis, we can write where the last identity was obtained making use of (3.21). Hence, (3.28) becomes which is identical to (2.4) for M g (t) , g ≥ 1 . This concludes the proof of the equivalence between the cyton model and the multi-stage model with generations when a cell's time to divide is Erlang distributed with parameters and N, and a cell's time to die is exponential with rate µ . In summary, the analysis presented in this section for the multi-stage model with Erlang division time and exponential death time leads to exact closed solutions for the cyton model with the previous choice of clocks.

Case study: lymphopenia-induced proliferation
In this Section we illustrate the applicability of the MS-G model to CFSE data, making use of an experimental study of lymphopenia-induced proliferation 20 . In particular, we compare the performance of the MS-G model to that of a simple exponential (or single stage) model with generations, which is equivalent to making N g = 1 for all g in the MS-G model. Differences in T cell proliferation have been observed to vary between different T cell clonotypes (i.e., the set of T cells with the same T cell receptor). Hogan et al. 20 transferred CFSE-labelled OT-I or F5 T cells intravenously to lymphopenic mice. A certain number of days (3, 4, 5, 6, 7, 10, 12 and 18 days) after the transfer, spleens and lymph nodes were recovered from the mice and analysed by flow cytometry to quantify the expression levels of CD8, CD5, CD44, and CFSE dilution 20 . For each time point, the number of mice analysed was between 3 and 7. We note that two independent transfer experiments, carried out under identical conditions, were performed: one for OT-I cells and a second one for F5. In Fig. 5 both data sets are shown: for each time point the number of cells is plotted for each mouse and generation (identified via the CFSE dilution measurement). On the left (right), OT-I (F5) cells are represented by the green (blue) histograms. In order to infer model parameters, we will consider all cells which have divided five or more times as a single class, denoted 5+ . This is similar to the approach considered in Refs. 4,29,35 . The rationale behind this choice is to reduce errors in the quantification of labelled cells with low CFSE fluorescence, as is the case for five or more divisions. Figure 5 clearly shows that OT-I T cells proliferate faster than F5 cells, so that by day 7 there are OT-I cells in generation 10, whereas for F5 cells the maximum generation observed at day 7 is 6. This greater proliferative capacity of OT-I cells eventually leads, after one week, to competition for resources (e.g., IL-7 cytokine) and the www.nature.com/scientificreports/ OT-I population approaching its carrying capacity 20 . Since our model does not account for competition, it can only appropriately describe the dynamics of OT-I cells during the first week of the experiment. Thus, for OT-I cells we will only make use of the data set up to that time (one week). Yet for the F5 population we will use the entire data set. In Hogan et al. 20 this competition was incorporated with a density-dependent birth rate, (P) , as follows where ¯ is the rate of growth under unlimited resources, δ the size of reduction caused by the expansion of competing cells, and P is the size of the population 20 . Figure 6 shows the density-dependent birth rate, (P) , as a function of the population size P. It suggests that the competition for resources is greater in the case of OT-I T cells. In the experiments the number of OT-I cells after one week (about 5 × 10 5 ) is larger than the population . Therefore, the population of F5 T cells never reaches its carrying capacity and the role of competition for resources can be neglected. We estimate model parameters with the ABC-SMC algorithm 33 . Thus, the posterior distribution of the parameters is obtained by T sequential applications of the ABC algorithm, where the posterior obtained in each iteration is used as prior for the next one. This algorithm requires the definition of prior distributions for the first iteration, a distance function, a tolerance threshold for each iteration, and a perturbation kernel 33 . We assume all parameters are initially distributed according to a uniform prior distribution, as described in Table 1. When a prior distribution spans several orders of magnitude, the uniform distribution is taken over the exponent to efficiently explore parameter space. Given x g D (t) , the experimentally determined mean number of cells in generation g at time t, for g ∈ {0, 1, 2, 3, 4, 5+} , and its corresponding model prediction, x g M (t) = M g (t) for a particular choice of parameters θ = (C 0 , N 0 , N, 0 , , α) , the distance function is defined as where T is the set of time points and depends on the clonotype of interest, σ g D (t) represents the standard deviation of the experimental data at time t and generation g, and G is the merged (and maximum) generation, G = 5+ . In practice, we define the first tolerance threshold, ε 1 , in the ABC-SMC algorithm as the median value of the distances obtained from 10 4 preliminary realisations, with the parameters sampled from the prior distributions in Table 1. The subsequent tolerance thresholds, ε j , j = 2, . . . , T can be then defined as the median of the distance values obtained from the previous iterations of the algorithm. Finally, we use a uniform perturbation kernel to perturb the parameters during the sequence of iterations 33 , and implement the algorithm for T = 16 in the case of the multi-stage model and T = 7 for the single stage one.
Before performing the Bayesian inference, we make some assumptions based on the experimental set up. Several studies have shown that the time to a first division is larger the time to subsequent divisions, since cells require time to become activated before they divide 13,15,16 . Thus, we assume that all generations but 0 are comprised of the same number of stages N, whereas generation 0 is characterised by N 0 stages. Similarly, cells in generation 0 proceed to divide with birth rate 0 , whilst all the other generations have a birth rate . Therefore, in contrast to the inference in 29 , the number of stages N 0 and N are free parameters in the model. On the other Figure 6. Density-dependent birth rate, (P) , as a function of the population size, P. The parameter ¯ , with units of cell · day −1 , represents the rate of growth under no competition and δ quantifies the level of reduction caused by the expansion of competing cells. Values for ¯ (shown in the inset) and δ = 6.0 × 10 −6 are taken from 20 , Table 1]. Table 1. Prior distributions for model parameters. Units for 0 , and α are inverse hours ( h −1 ).

Model parameters
Description Prior distribution Scientific Reports | (2022) 12:11289 | https://doi.org/10.1038/s41598-022-14202-0 www.nature.com/scientificreports/ hand, we propose that the per cell death rate in a given generation is linear on the number of cell divisions that the cell has undergone 35,45 . We write where α is a parameter to estimate. These linear death rates encode the fact that cells are more likely to die when they have already undergone several divisions 35,45 . Finally, the initial number of cells, C 0 , is considered a parameter to be estimated, since the actual number of transferred cells which make it to the lymph nodes or spleen cannot be measured.   Fig. 7. We run the model with the parameters being sampled from the estimated posterior distributions and compute the median of all the simulations, which corresponds to the solid magenta (multi-stage model) and turquoise (exponential model) lines in Fig. 7. The bands around median predictions represent 95% confidence intervals. Data points are plotted with the standard deviation from the multiple experimental replicates. As shown in Fig. 7, Table 2. Despite the two extra parameters, the values of AIC C corresponding to the multi-stage model are significantly lower for both clonotypes. Overall, the MS-G model is able to explain the data from the OT-I transfer experiment better, since this data set is less noisy than the F5 set.
The marginal posterior distributions for each parameter are shown in green and blue in Figs. 8 and 9, for the multi-stage and exponential models, respectively, and the (uniform) prior distributions are plotted in red. Summary statistics of these posterior distributions are shown in Tables 3, 4, 5 and 6. Cell death is governed by     www.nature.com/scientificreports/ the parameter α , and is estimated to be low for both models and clonotypes, suggesting that cell death does not have a significant impact on the dynamics during lymphopenia, which is in fact dominated by cell division. This result is in agreement with Hogan et al. 20 , where the death rate is assumed to be zero. The initial number of cells can be estimated with relative success, and does not seem to depend heavily on the model considered. On the other hand, cell division is governed by parameters (N 0 , 0 , N, ) , with N 0 = N = 1 in the exponential model.   www.nature.com/scientificreports/ We note that in both models, N 0 0 and N represent the mean time to the first and subsequent divisions, respectively. Although all division-related parameters can be estimated from the data, for both models and clonotypes, a correlation between the division rate and the number of stages is seen in the scatter plots of Fig. 10. Instead of plotting the marginal posterior distributions for these parameters, one can consider the posterior distribution for the mean times N 0 0 and N (see Fig. 10). The fact that N = 1 is never chosen as an accepted parameter value in the posterior distribution for the multi-stage model and the OT-I clonotype already suggests that a multi-stage representation of cell division is preferred for this clonotype. On the other hand for the F5 clonotype the marginal distribution for N shows a non-zero frequency for the value 1, but larger values of N are also represented in its posterior distribution. The mean time to both first and subsequent divisions, N 0 0 and N , are significantly longer for the F5 clonotype than the OT-I. In fact, our results estimate that F5 T cells divide slowly compared to OT-I cells, requiring on average 192 h to carry out a first division (59 h taken by OT-I T cells), as shown in Fig. 10 for the multi-stage model. The time to subsequent divisions is represented by the blue histograms. Interestingly, our estimation of the mean time to first division of OT-I cells, on average 59 h, is close to the value obtained by Hogan et al. 20 (52 h when considering the best fit parameter estimates). In the case of F5 cells, we predict an average of 192 h to undergo their first division, whereas Hogan et al. obtained a value of 137 h. We note that the value 137 h is within the range covered by our predicted posterior distribution.
Our results indicate that OT-I T lymphocytes require on average 59 h for their first division, and a bit less, 46 h, for subsequent divisions (see upper left plot of Fig. 10). Based on our Bayesian approach, we conclude that a multi-stage model with a constant division rate after the first division event, is a suitable description of lymphopenia-induced proliferation 5,29,40 . The MS-G model estimates that F5 cells take on average slightly less than 200 h to divide, both for the first or subsequent division rounds, as shown in the lower left plot of Fig. 10. This difference can be explained by the different characteristics of OT-I and F5 T cells, and was previously observed 20 . The posterior distributions of the expected time to subsequent divisions in the MS-G model, N , and in the exponential one 1 , shown as blue histograms in Fig. 10, indicate that the exponential model predicts a longer division time than the multi-stage model for both clonotypes. This can be justified by the implementation of the ABC-SMC algorithm. Indeed, when parameterising the exponential model, the algorithm tries to keep the distance between the model predictions and the experimental observations low. This leads to the choice of parameter sets which limit cell proliferation, as shorter division times in the exponential model would lead to an increase in cell numbers not observed in the data set, and thus larger distance values. This is why the estimated birth rates in the exponential model are lower than the ones in the multi-stage representation. As a result, the exponential model predicts a greater average division time than the multi-stage model for both clonotypes. Finally, our results indicate that for both clonotypes the exponential model (see Fig. 10) found a shorter time to first division than to subsequent ones, contradicting previous findings 13,15,16 , which support longer first division times. This is related to the fact that, overall, the exponential model is not able to capture the observed cell dynamics for neither of the clonotypes, as can be seen in Fig. 7.

Discussion
We analyse a multi-stage model of cell proliferation and death, tracking cell generations, in a framework that retains the benefits of a Markov process. With particular choices of rates, the models are equivalent to others in the literature 13,27,29,[40][41][42] . In the case study of "Case study: lymphopenia-induced proliferation" section, the MS-G model performs better than the exponential model of time to division. The model implemented here provides a flexible framework for estimating the birth and death rates that describe the dynamics of lymphocyte populations 48,49 . The representation retains the advantages of a Markovian approach, including analytical tractability in some cases, and computational efficiency of numerical simulations with the Gillespie algorithm 50,51 . The expected number of cells in each generation satisfy a set of linear differential equations. Further comparison of this and of published models 13,27,29,[40][41][42] with different experimental datasets is the aim of future work.
It has been observed 13,15,16 that immune cells typically need longer to divide for the first time, whereas later divisions require shorter times 5 . It is possible to assume that divided and undivided cells have different probability densities of time to cell division in exponential and Smith-Martin models 7,35,45 . With the multi-stage model introduced here, the separation need not be explicit because it is incorporated in the generation-dependent parameters. A longer mean time to first division, N 0 0 , than mean time to subsequent divisions, N , is a natural part of the framework. Extension of the mathematical analysis in "Analytical results" section to the case 0 = , N 0 = N and possibly generation-dependent death rate µ g , g ≥ 0 , would be desirable.
Our calculations rely on the assumption that cells are independent of each other. In particular, no fate correlation is assumed between daughter cells and their progenitors, or between siblings. However, data sets from time-lapse microscopy of B and T cell families [10][11][12]14,16,17,25 show that division and death times for siblings are correlated, and "division destiny" is a familial characteristic 24 . A further potential extension of the MS-G model is the introduction of a population carrying capacity. In the model as described in "Analytical results" section, the mean number of cells over time either increases without bound, dies out or reaches a steady-state, depending on the relation between division-related parameters (birth rate and number of stages in the cell cycle), and the death rate. Competition for resources may be modelled using density-dependent birth and/or death rates 20,52 , or by rates that depend on the time-dependent availability of resources 53 .